This is the interactive version of the Liston-Dooley lab’s flowcytoscript analysis tool. In this R Markdown notebook, we’ll try to help you analyze your high-parameter flow cytometry data in a way that should be easier if you’re not experienced in R. We’ve already optimized several of the analysis parameters (e.g., for clustering, tSNE and UMAP). If you want to get into more detail or change the appearance of the plots, have a look at “flowcytoscript_graphics_parameters_src.r”, or at the more complex version of the script.
Please review the instructions document before proceeding. The script is intended to analyze data files containing cells that have been pre-gated to remove debris and dead cells and, usually, to select a cell type of interest. We recommend exporting your cytometry files in the “CSV - channel values” format so the scaling (biexponential transform) is preserved. You can also use exported FCS files, but in this case your data will be transformed automatically by the script based on the type of cytometer you used. The data needs to be in a folder called “Data”, and this “Data” folder should be in the same folder as this notebook and the “source_files” folder.
This will probably run more smoothly in a local folder (not Dropbox or OneDrive).
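Before running the first chunk, it can help to confirm that the folders the notebook expects are where it will look for them. The sketch below is not part of flowcytoscript: `check_layout()` is a hypothetical helper, and the folder paths you pass to it are whatever you assign to `fcs.data.dir` and `fcs.src.dir` in the settings chunk.

```r
# Hypothetical helper, not part of flowcytoscript: report which of the
# expected folders exist and how many CSV/FCS files the data folder holds.
check_layout <- function(data.dir, src.dir) {
  dirs <- c(data = data.dir, source = src.dir)
  found <- dir.exists(dirs)
  names(found) <- names(dirs)

  # Tell the user about anything that is missing
  for (nm in names(found)) {
    if (!found[[nm]]) {
      message("Missing ", nm, " folder: ", dirs[[nm]])
    }
  }

  # Count the CSV and FCS files in the data folder, if it exists
  n.files <- 0L
  if (found[["data"]]) {
    n.files <- length(list.files(data.dir, pattern = "\\.(csv|fcs)$",
                                 ignore.case = TRUE))
  }

  list(found = found, n.files = n.files)
}
```

Calling `check_layout(fcs.data.dir, fcs.src.dir)` after the settings have been defined returns which folders were found and the number of data files, so a misplaced folder shows up before any packages are loaded.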
Execute each code chunk in order by clicking the Run button (green arrow) within the chunk. Some results will appear below each chunk, and a record of what you’ve done will be generated as an html file (flowcytoscript.nb) that you can open with any web browser. If you want the file name to reflect a particular experiment, rename the .rmd file before starting.
Once finished, you should see a report generated in your html notebook.
If this is your first time running the script, you should probably run flowcytoscript_setup.r first. That will help you update R and install Rtools.
To change how long the messages are displayed for, change message.delay.time <- 3 to a bigger (slower) or smaller (faster) number.
To eliminate messages and just respond to prompts, change Be.Chatty <- TRUE to Be.Chatty <- FALSE in the gray box below.
This section installs any packages you will need for the analysis. You’ll get a warning if anything fails to install properly.
fcs.data.dir <- "./CD8"
fcs.src.dir <- "./00_source_files"
message.delay.time <- 0
Be.Chatty <- TRUE
source( file.path( fcs.src.dir, "flowcytoscript_startup.r") )
Welcome to flowcytoscript!
This simplified version of the Liston Lab flow cytometry analysis
pipeline will try to take care of as much as possible.
We're going to have you tell us what your groups are,
which markers you want to analyze, and how many cells
you want to work with.
After that, we'll try to cluster
your data, and provide you with visualizations in the forms
of tSNE, UMAP, PCA, heatmaps and barcharts.
For best results, make sure your R and Rtools are up-to-date.
If you can do this yourself, that may work better, particularly
for non-Windows users.
Alternatively, you can run the flowcytoscript_setup.R script
in the source_files folder.
Now we're going to try to install any of the required packages
that you don't already have installed.
That's all done. Now, on to the analysis!
source( file.path( fcs.src.dir, "flowcytoscript_load_runchecks.r") )
Loading packages...
Loading required package: dunn.test
Loading required package: Matrix
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Attaching package: ‘tidyr’
The following objects are masked from ‘package:Matrix’:
expand, pack, unpack
Loading required package: RcppHNSW
Attaching package: ‘flowCore’
The following object is masked from ‘package:Matrix’:
%&%
Registered S3 method overwritten by 'data.table':
method from
print.data.table
As part of improvements to flowWorkspace, some behavior of
GatingSet objects has changed. For details, please read the section
titled "The cytoframe and cytoset classes" in the package vignette:
vignette("flowWorkspace-Introduction", "flowWorkspace")
data.table 1.14.8 using 4 threads (see ?getDTthreads). Latest news: r-datatable.com
Attaching package: ‘data.table’
The following objects are masked from ‘package:dplyr’:
between, first, last
If there are no error messages, then the packages are loaded and source code has been located.
Warnings about packages being built under a slightly different R version are usually not a problem.
Proceed to the next step.
Set your groups, select your channels, select how many cells to run.
FCS and CSV files are accepted. For CSV, you should use CSV-Channel Values files exported from FlowJo after setting the biexponential transformation for each channel. For FCS files, a biexponential transformation will be applied automatically based on the type of cytometer you’ve used.
If you make any mistakes here, you can re-run this chunk and try again.
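The group step relies on each data file name containing the tag of exactly one group. A minimal sketch of that idea follows, assuming literal substring matching; `match_files_to_groups()` and the example file names are hypothetical, and the script's own matching code may work differently.

```r
# Hypothetical helper, not flowcytoscript's own matching code: assign each
# file to the group whose tag appears in its name, and flag ambiguity.
match_files_to_groups <- function(file.names, group.tags) {
  # hits[i, j] is TRUE when tag j occurs literally in file name i
  hits <- vapply(group.tags,
                 function(tag) grepl(tag, file.names, fixed = TRUE),
                 logical(length(file.names)))
  hits <- matrix(hits, nrow = length(file.names),
                 dimnames = list(file.names, group.tags))

  n.hits <- unname(rowSums(hits))
  group <- rep(NA_character_, length(file.names))

  # Only files matching exactly one tag get a group; others stay NA
  one <- n.hits == 1
  if (any(one)) {
    group[one] <- group.tags[max.col(hits[one, , drop = FALSE] + 0,
                                     ties.method = "first")]
  }

  data.frame(file = file.names, group = group, n.matches = n.hits,
             stringsAsFactors = FALSE)
}

# Made-up file names; "Tissue" is deliberately not unique to one group
match_files_to_groups(
  c("Blood_WT_01.csv", "Tissue_KO_01.csv"),
  c("Blood_WT", "Tissue", "Tissue_KO")
)
```

In the example, the second file matches both "Tissue" and "Tissue_KO", so it is left unassigned. This is the kind of overlap that unique tags avoid.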
source( file.path( fcs.src.dir, "flowcytoscript_define_experiment.r") )
This part of the workflow requires your input.
Let's start by defining the groups in your experiment.
For this to work, the data files need to be named with the identifying
label for the group, and those tags need to be unique to each group.
Please type the tags (names) exactly as they appear in the files.
In the following step, you'll get the opportunity to assign new names
for each group, which will appear on the plots in the end.
8
Blood_WT
Blood_Areg
Lymphoid_WT
Lymphoid_KO
Tissue_WT
Tissue_KO
GALT_WT
GALT_KO
Do you need to correct any of the group names?
1 Blood_WT
2 Blood_Areg
3 Lymphoid_WT
4 Lymphoid_KO
5 Tissue_WT
6 Tissue_KO
7 GALT_WT
8 GALT_KO
Enter relevant number(s), separated by commas
(Ranges such as 3:7 may be specified)
(Enter 0 for none)
0
Do you want to enter different names for the groups? These labels will appear on the plots.
1: Yes
2: No
1
WT Blood
AregKO Blood
WT Lymphoid
AregKO Lymphoid
WT Tissue
AregKO Tissue
WT GALT
AregKO GALT
Your groups will be labeled as follows:
Blood_WT Blood_Areg Lymphoid_WT Lymphoid_KO
"WT Blood" "AregKO Blood" "WT Lymphoid" "AregKO Lymphoid"
Tissue_WT Tissue_KO GALT_WT GALT_KO
"WT Tissue" "AregKO Tissue" "WT GALT" "AregKO GALT"
We recommend using CSV files with the biexponential transformation already embedded
in the data. To create these CSV-channel-value files, see the instructions.
If you plan to use FCS files, you'll need to transform the data in the next steps.
Please select whether you are using CSV or FCS files.
1: CSV
2: FCS
1
Now you'll need to select the markers (channels) you want to use for your analysis.
Enter the numbers of the channels you want. You may need to expand the console window
in order to see everything.
Please select channels for analysis:
Channel Marker
1 FSC.A
2 FSC.H
3 SSC.A
4 SSC.B.A
5 SSC.B.H
6 SSC.H
7 Comp.AF.A
8 Foxp3
9 T.bet
10 IRF4
11 Ki67
12 CCR9
13 CD95
14 Ly.6C
15 CD103
16 CD4
17 NK1.1
18 CTLA.4
19 CD19
20 CD62L
21 CD8
22 CXCR6
23 CCR2
24 CD44
25 ICOS
26 RORgT
27 PD.1
28 CXCR3
29 viability
30 TNFRII
31 CD69
32 CD25
33 ST2
34 GATA.3
35 Neuropilin
36 injected.CD45
37 CD3
38 KLRG1
39 Helios
40 Time
Enter relevant number(s), separated by commas
(Ranges such as 3:7 may be specified)
(Enter 0 for none)
8:15,17:18,20,22:28,30:35,38:39
These are the channels you've selected:
Foxp3
T.bet
IRF4
Ki67
CCR9
CD95
Ly.6C
CD103
NK1.1
CTLA.4
CD62L
CXCR6
CCR2
CD44
ICOS
RORgT
PD.1
CXCR3
TNFRII
CD69
CD25
ST2
GATA.3
Neuropilin
KLRG1
Helios
Do you need to change your channel selection?
1: Yes
2: No
2
Next you'll have the option to rename the marker labels.
To rename any channels, select them now
1 Foxp3
2 T.bet
3 IRF4
4 Ki67
5 CCR9
6 CD95
7 Ly.6C
8 CD103
9 NK1.1
10 CTLA.4
11 CD62L
12 CXCR6
13 CCR2
14 CD44
15 ICOS
16 RORgT
17 PD.1
18 CXCR3
19 TNFRII
20 CD69
21 CD25
22 ST2
23 GATA.3
24 Neuropilin
25 KLRG1
26 Helios
Enter relevant number(s), separated by commas
(Ranges such as 3:7 may be specified)
(Enter 0 for none)
0
This is how your channels will be labeled:
Foxp3 T.bet IRF4 Ki67 CCR9 CD95
"Foxp3" "T.bet" "IRF4" "Ki67" "CCR9" "CD95"
Ly.6C CD103 NK1.1 CTLA.4 CD62L CXCR6
"Ly.6C" "CD103" "NK1.1" "CTLA.4" "CD62L" "CXCR6"
CCR2 CD44 ICOS RORgT PD.1 CXCR3
"CCR2" "CD44" "ICOS" "RORgT" "PD.1" "CXCR3"
TNFRII CD69 CD25 ST2 GATA.3 Neuropilin
"TNFRII" "CD69" "CD25" "ST2" "GATA.3" "Neuropilin"
KLRG1 Helios
"KLRG1" "Helios"
Now we'll match the data files to the group names you entered earlier.
Files per group:
flow.sample.condition
Blood_WT Blood_Areg Lymphoid_WT Lymphoid_KO Tissue_WT Tissue_KO GALT_WT
4 4 24 24 24 24 8
GALT_KO
8
Events per group:
WT Blood AregKO Blood WT Lymphoid AregKO Lymphoid WT Tissue
119432 92147 430479 433066 62293
AregKO Tissue WT GALT AregKO GALT
28968 644426 333020
Events per sample:
Blood_WT.01 Blood_WT.02 Blood_WT.03 Blood_WT.04 Blood_Areg.01
33985 39021 22511 23915 21606
Blood_Areg.02 Blood_Areg.03 Blood_Areg.04 Lymphoid_WT.01 Lymphoid_WT.02
17352 23200 29989 4129 3648
Lymphoid_WT.03 Lymphoid_WT.04 Lymphoid_WT.05 Lymphoid_WT.06 Lymphoid_WT.07
4675 5246 20063 22924 17588
Lymphoid_WT.08 Lymphoid_WT.09 Lymphoid_WT.10 Lymphoid_WT.11 Lymphoid_WT.12
19958 36454 23070 21742 45168
Lymphoid_WT.13 Lymphoid_WT.14 Lymphoid_WT.15 Lymphoid_WT.16 Lymphoid_WT.17
9283 6271 5702 4870 16387
Lymphoid_WT.18 Lymphoid_WT.19 Lymphoid_WT.20 Lymphoid_WT.21 Lymphoid_WT.22
14734 15390 18718 19120 34031
Lymphoid_WT.23 Lymphoid_WT.24 Lymphoid_KO.01 Lymphoid_KO.02 Lymphoid_KO.03
29424 31884 9131 12098 7599
Lymphoid_KO.04 Lymphoid_KO.05 Lymphoid_KO.06 Lymphoid_KO.07 Lymphoid_KO.08
9898 18815 16401 14641 18954
Lymphoid_KO.09 Lymphoid_KO.10 Lymphoid_KO.11 Lymphoid_KO.12 Lymphoid_KO.13
28143 22842 28758 13311 6215
Lymphoid_KO.14 Lymphoid_KO.15 Lymphoid_KO.16 Lymphoid_KO.17 Lymphoid_KO.18
10474 7048 9311 18193 15612
Lymphoid_KO.19 Lymphoid_KO.20 Lymphoid_KO.21 Lymphoid_KO.22 Lymphoid_KO.23
19448 22568 27586 28444 37121
Lymphoid_KO.24 Tissue_WT.01 Tissue_WT.02 Tissue_WT.03 Tissue_WT.04
30455 1013 890 1396 1019
Tissue_WT.05 Tissue_WT.06 Tissue_WT.07 Tissue_WT.08 Tissue_WT.09
9560 6062 2537 2747 2618
Tissue_WT.10 Tissue_WT.11 Tissue_WT.12 Tissue_WT.13 Tissue_WT.14
1860 256 517 856 3382
Tissue_WT.15 Tissue_WT.16 Tissue_WT.17 Tissue_WT.18 Tissue_WT.19
540 7273 12504 4839 1020
Tissue_WT.20 Tissue_WT.21 Tissue_WT.22 Tissue_WT.23 Tissue_WT.24
734 149 353 123 45
Tissue_KO.01 Tissue_KO.02 Tissue_KO.03 Tissue_KO.04 Tissue_KO.05
892 3002 1456 1433 13
Tissue_KO.06 Tissue_KO.07 Tissue_KO.08 Tissue_KO.09 Tissue_KO.10
4311 3489 211 268 327
Tissue_KO.11 Tissue_KO.12 Tissue_KO.13 Tissue_KO.14 Tissue_KO.15
561 787 963 3375 881
Tissue_KO.16 Tissue_KO.17 Tissue_KO.18 Tissue_KO.19 Tissue_KO.20
2464 471 967 210 1564
Tissue_KO.21 Tissue_KO.22 Tissue_KO.23 Tissue_KO.24 GALT_WT.01
157 435 573 158 214634
GALT_WT.02 GALT_WT.03 GALT_WT.04 GALT_WT.05 GALT_WT.06
166224 84693 166711 4518 3335
GALT_WT.07 GALT_WT.08 GALT_KO.01 GALT_KO.02 GALT_KO.03
620 3691 67252 66920 60885
GALT_KO.04 GALT_KO.05 GALT_KO.06 GALT_KO.07 GALT_KO.08
108209 12988 11143 2197 3426
We found these data files matching your groups.
If this doesn't meet your expectations, you should start over
and double-check your file names vis-a-vis your group names.
Please set the number of cells (events) you'd like to use for the analysis.
This will be set as a maximum number per file, so if you set it at 2000
but you only have 500 in some samples, all 500 will be used.
The more data you analyze, the longer it will take. If you aren't sure,
maybe try for a total of no more than 100,000 (for example, 2 groups
with 5 samples per group and 10000 cells/sample gives 100000 total.)
Please enter the number without punctuation.
For your analysis, please enter a maximum number of cells you'd like to
analyze per sample. For samples with fewer cells than this number, all
cells will be used.
5000
Do you want overlays of every marker on your tSNE and UMAP projections as well as plotting clusters?
Generating many plots can be slow with lots of cells.
1: Yes
2: No
1
Please select two groups for the T-REX analysis of under- and over-represented regions
1 WT Blood
2 AregKO Blood
3 WT Lymphoid
4 AregKO Lymphoid
5 WT Tissue
6 AregKO Tissue
7 WT GALT
8 AregKO GALT
Enter relevant number(s), separated by commas
(Ranges such as 3:7 may be specified)
(Enter 0 for none)
5,6
Do you want to run the crossentropy statistical test on your tSNE and UMAP projections?
It can be slow if you have lots of events, but is a powerful tool.
1: Yes
2: No
1
Please enter the model, disease or experimental system you're working with.
For example, Alzheimer's Disease, IL-2 therapy, tissue residency...
tissue residency
Setting color palette...
Creating output folders...
Selecting data for analysis...
Move to the next section.
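The selection prompts in this chunk accept comma-separated numbers and ranges such as 3:7, with 0 meaning none. The sketch below shows one way such input could be expanded into indices. `parse_selection()` is a hypothetical re-implementation for illustration only; the script's own input handling may differ.

```r
# Hypothetical re-implementation for illustration; flowcytoscript's own
# input handling may differ.
parse_selection <- function(input, n.max) {
  # Split on commas and drop empty pieces
  parts <- trimws(strsplit(input, ",", fixed = TRUE)[[1]])
  parts <- parts[nzchar(parts)]
  if (length(parts) == 0) stop("Nothing was entered")

  # Expand each piece: "20" stays 20, "3:7" becomes 3, 4, 5, 6, 7
  idx <- unlist(lapply(parts, function(p) {
    bounds <- suppressWarnings(
      as.integer(strsplit(p, ":", fixed = TRUE)[[1]])
    )
    if (length(bounds) > 2 || anyNA(bounds)) stop("Cannot read '", p, "'")
    if (length(bounds) == 2) seq(bounds[1], bounds[2]) else bounds
  }))
  idx <- unique(idx)

  # A lone 0 means "none"
  if (identical(idx, 0L)) return(integer(0))

  if (any(idx < 1 | idx > n.max)) {
    stop("Selection must use numbers between 1 and ", n.max)
  }
  idx
}

# The channel selection entered above, out of 40 listed channels
channels <- parse_selection("8:15,17:18,20,22:28,30:35,38:39", n.max = 40)
length(channels)  # 26
```

The 26 indices agree with the 26 channels listed back by the script after the selection was entered.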
Choose between Phenograph (recommended) and FlowSOM clustering approaches.
The script will automatically name the clusters for you based on the best match in the cell database spreadsheet. You should check the validity of these names by looking at the density plots for the clusters. If you don’t get good results with this automated naming, check whether the cell types you are looking for are covered in the database. If they are not, add them with the appropriate positive and negative expression markers.
You’ll get the option to rename any clusters.
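To see why database entries need both positive and negative markers, consider the toy scoring scheme below. It is not flowcytoscript's matching algorithm: the two entries in `cell.db`, the scoring rule and the function names are all assumptions made for illustration.

```r
# Toy illustration only; this is NOT flowcytoscript's matching algorithm.
# Each hypothetical database entry lists markers expected to be high
# (pos) or low (neg) for a cell type.
cell.db <- list(
  "Naive CD8" = list(pos = c("CD62L"), neg = c("CD44", "CD69")),
  "CD8 TRM"   = list(pos = c("CD69", "CD103"), neg = c("CD62L"))
)

# Score = fraction of an entry's measured markers that agree with the
# cluster's high/low calls. Markers that were not measured are ignored.
score_entry <- function(high.markers, measured, entry) {
  pos <- intersect(entry$pos, measured)
  neg <- intersect(entry$neg, measured)
  if (length(pos) + length(neg) == 0) return(NA_real_)
  agree <- c(pos %in% high.markers, !(neg %in% high.markers))
  mean(agree)
}

# Return the name of the highest-scoring entry
best_match <- function(high.markers, measured, db) {
  scores <- vapply(db,
                   function(e) score_entry(high.markers, measured, e),
                   numeric(1))
  names(scores)[which.max(scores)]
}

measured <- c("CD62L", "CD44", "CD69", "CD103")
best_match(c("CD69", "CD103", "CD44"), measured, cell.db)
```

In this toy scheme, an entry with only positive markers could not be told apart from any other cell type that shares them. The negative markers are what separate the two example entries.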
source( file.path( fcs.src.dir, "flowcytoscript_clustering.r") )
As the first part of the analysis, the script will cluster your cells into groups.
In general, we recommend using Phenograph because it is fast,
does not require guessing about how many clusters there should be,
and accurately subclusters real cell types in complex mixtures. However,
Phenograph sometimes overclusters, particularly in cases with lots of
homogeneous cells.
If you prefer to use FlowSOM, you'll need to decide how many clusters you want to find.
Please choose your clustering approach.
1: Phenograph
2: FlowSOM
1
Clustering data with Phenograph
[1] 32
Next, the script will try to identify and name the cell types present in every cluster.
For this to work best, you'll need to enter three pieces of information:
1) Whether you're using human or mouse cells
2) Which tissues you're using
3) If you've pre-selected only certain cell types, which cells those are.
Please consider the automated naming as a guide, and review the names by checking the heatmaps
and density plots for the clusters. If you don't see the correct cell types being identified,
check the instructions for cluster naming and add your cell type definitions to the database spreadsheet.
Please select the species your cells come from.
1: Mouse
2: Human
1
New names:
Select the tissue or tissues your cells come from.
1 Immune
2 Lung
3 Liver
4 Skin
5 Brain
Enter relevant number(s), separated by commas
(Ranges such as 3:7 may be specified)
(Enter 0 for none)
1
If you have pre-gated on specific cell types, please select them here.
If you're using all viable cells, select 1.
1 All
2 T cell
3 ab T cell
4 gd T cell
5 CD4
6 CD8
7 CD4 Tconv
8 CD4 Treg
9 Act CD4 Tconv
10 Act CD4 Treg
11 CD8 Tconv
12 B cell
13 Transitional B cell
14 ILC
15 DC
16 Lineage-neg
Enter relevant number(s), separated by commas
(Ranges such as 3:7 may be specified)
(Enter 0 for none)
6
New names:
Plotting histograms for each marker for samples...
Plotting histograms for each marker for clusters...
Do you want to rename any clusters?
1 Naïve CD8 CCR9
2 CD8 TRM CD103 CD69 Helios
3 CD8 TRM
4 Naïve CD8 Ly.6C CCR9
5 Act CD8 CD44
6 CD8 TRM ICOS CD69
7 Naïve CD8 RORgT Ki67 Neuropilin GATA.3
8 Naïve CD8 Ly.6C
9 Naïve CD8 CD103
10 Memory CD8 Ly.6C
11 Naïve CD8 Ly.6C
12 Naïve CD8 CCR9 CD103
13 Act CD8 Ly.6C
14 CD8 SLEC CD62L CXCR3
15 Memory CD8 Ly.6C Ki67 KLRG1
16 CD8 SLEC
17 Naïve CD8 Ly.6C CD69
18 Memory CD8 Ly.6C Helios KLRG1
19 Naïve CD8 CCR2 CCR9 CD103
20 CD8 TRM PD.1 CD69 Helios
21 CD8 TRM CD103 CD69 Helios
22 Memory CD8 KLRG1
23 CD8 TRM PD.1 CD69
24 Memory CD8 Ly.6C CD69 KLRG1
25 CD8 TRM CD103 Ki67 ICOS CD44
Enter relevant number(s), separated by commas
(Ranges such as 3:7 may be specified)
(Enter 0 for none)
21
CD8 Treg
Your clusters will be named as follows:
Naïve CD8 CCR9
CD8 TRM CD103 CD69 Helios
CD8 TRM
Naïve CD8 Ly.6C CCR9
Act CD8 CD44
CD8 TRM ICOS CD69
Naïve CD8 RORgT Ki67 Neuropilin GATA.3
Naïve CD8 Ly.6C
Naïve CD8 CD103
Memory CD8 Ly.6C
Naïve CD8 Ly.6C
Naïve CD8 CCR9 CD103
Act CD8 Ly.6C
CD8 SLEC CD62L CXCR3
Memory CD8 Ly.6C Ki67 KLRG1
CD8 SLEC
Naïve CD8 Ly.6C CD69
Memory CD8 Ly.6C Helios KLRG1
Naïve CD8 CCR2 CCR9 CD103
CD8 TRM PD.1 CD69 Helios
CD8 Treg
Memory CD8 KLRG1
CD8 TRM PD.1 CD69
Memory CD8 Ly.6C CD69 KLRG1
CD8 TRM CD103 Ki67 ICOS CD44
Are you happy with the cluster names?
1: Yes
2: No
1
Exporting cluster counts and percentages as spreadsheets...
Plotting histograms for each marker for clusters...
Proceed to the next section.
You can take a break while it runs, although it may only be a couple of minutes.
dmrd.data.n
[1] 420971
FlowCytoScript analyzed your data on tissue residency in
the Immune system. 26 channels were included
and clustering was performed with Phenograph on
420971 cells. 25 clusters were found,
annotated and 3 of these were significantly different
between Tissue_WT and Tissue_KO.
tSNE, UMAP and PCA analyses were performed, and heatmaps, histograms
and expression density plots were generated. Statistical analysis on
marker expression and cluster frequencies was performed, and the results
can be found in the ./marker_stats/ folder.
These are the channels that were used in the analysis: Foxp3, T.bet, IRF4, Ki67, CCR9, CD95, Ly.6C, CD103, NK1.1, CTLA.4, CD62L, CXCR6, CCR2, CD44, ICOS, RORgT, PD.1, CXCR3, TNFRII, CD69, CD25, ST2, GATA.3, Neuropilin, KLRG1, Helios
The analysis completed in 69.49 minutes. Setting up the analysis and clustering took 123.18 minutes.
[Figure: tSNE visualization of the data]
[Figure: UMAP visualization of the data with clusters in colored overlay]
[Figure: Principal Components Analysis based on marker expression]
[Figure: Principal Components Analysis by cluster distribution]
[Figure: Heatmap of marker expression by sample]
[Figure: Marker expression by sample]
[Figure: Heatmap of marker expression by cluster]
[Figure: Marker expression by cluster]
[Figure: Sample histogram]
[Figure: Changes in distribution: Tissue_WT versus Tissue_KO]